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Abstract. We analyse the complexity of computing class polynomials, that 
are an important ingredient for CM constructions of elliptic curves, via com- 
plex floating point approximations of their roots. The heart of the algorithm 
is the evaluation of modular functions in several arguments. The fastest one 
of the presented approaches uses a technique devised by Dupont to evalu- 
ate modular functions by Newton iterations on an expression involving the 
arithmetic-geometric mean. Under the heuristic assumption, justified by ex- 
periments, that the correctness of the result is not perturbed by rounding 
errors, the algorithm runs in time 

O (y\D\ log 3 \D\ M (y/\D\ log 2 \D\X\ C O (\D\ log 6+e \D\) C O (h 2+e ) 

for any e > 0, where D is the CM discriminant, h is the degree of the class 
polynomial and M(n) is the time needed to multiply two n-bit numbers. Up 
to logarithmic factors, this running time matches the size of the constructed 
polynomials. The estimate also relies on a new result concerning the complex- 
ity of enumerating the class group of an imaginary quadratic order and on a 
rigorously proven upper bound for the height of class polynomials. 



1. Motivation and results 

The theory of complex multiplication yields an efficient approach to the con- 
struction of elliptic curves over a finite field having a given endomorphism ring of 
discriminant D, as long as \D\ is not too large. Exploiting the link between the 
endomorphism ring of an elliptic curve and its number of points, it is possible to 
efficiently obtain curves with specific properties. Applications include primality 
proving [3], the construction of classical, discrete logarithm based elliptic curve 
cryptosystems and of identity based cryptosystems [361 03 E] • 

The classical approach to effective complex multiplication is to compute minimal 
polynomials of special, algebraic values of modular functions on the upper complex 
halfplane (more details are provided in Section^. The complexity of this algorithm 
remains a shady issue; one of the reasons a serious analysis has not been undertaken 
so far are the purported numerical difficulties with the computations with complex 
floating point numbers, an issue that actually does not arise in practice (cf. the 
short discussion at the end of Section [7]). 

In [13j the authors present an algebraic generalisation of the complex multi- 
plication algorithms to p-adic fields, that yields the same minimal polynomials. 
Besides ruling out any possible numerical instabilities, they obtain a complexity of 
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0(|Z)| 1+e ), which is better than a straightforward implementation of the complex 
approach, and asymptotically optimal (up to logarithmic factors) due to the size of 
the constructed objects. 

The present article thus has two goals. First, it provides an accurate account of 
the complexity of different algorithms for class polynomial computation via floating 
point approximations. Second, it shows that asymptotically optimal algorithms 
exist also in the complex setting, with a complexity that is linear (up to logarithmic 
factors) in the output size. These new algorithms are presented in Sections 16.31 and 
16.41 and the faster one achieves the following: 

Theorem 1.1. Let f be a fixed modular function that is a class invariant for a 
family of discriminants D of class numbers h — h(D). Then the algorithm of 
Section \6.4\ which computes a floating point approximation to the class polynomial 
for f , runs in time 



when executed with complex floating point numbers ofn = n(D) bits precision, where 
M(n) is the time needed to multiply two such numbers as detailed in Section \3.1\ 

The floating point precision n required to carry out the computations is clearly 
bounded from below by the height of the class polynomial. The following theorem 
provides a rigorously proven upper bound on the heights, that is close to experi- 
mental findings. A bound of the same shape is given in [IJ §3.3] as a heuristic, and 
the gist of the proof appears already in [341 §5.10] and [Jl] Section 2]. 

Theorem 1.2. The logarithmic height of the class polynomial for j for the dis- 
criminant D of class number h is bounded above by 



18.587 ...,c 3 = 17.442 . . ., c 4 = 11.594 ...,c 5 = 3.011 ... and c 6 = 2.566 . . .. 

The asymptotic upper bound of 



holds for any other class invariant as well. 

In practice, one observes that rounding errors do not disturb the result. It 
suffices to take n as an approximation of the height plus a few guard digits to be 
able to round the floating point approximation of the class polynomial to the correct 
polynomial with integral coefficients. A rigorous error analysis, however, appears 
to be out of reach. So Theorems 11.11 and 11.21 can be brought together only in the 
form of a heuristic, assuming that the computations with floating point numbers of 
n bits yield an approximation of the class polynomial that is correct on essentially 
n bits. 



Corollary 1.3 (heuristic). Taking n € O I y/\D\ log 2 |D| J in Theorem and 



using the bound on the class number h € O I ^/]Z)| log \D\ ) proved at the end of 



O (h(\og 2 h + log n)M(n)) 




O 



(v^Dllog 2 !^) 
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Section^ the algorithm of Section \6.4\ computes the class polynomial for f in time 

O (^\D\\oi \D\ M (^log 2 \D\)) C O {\D\ log 6+£ \D\) C O (h 2+e ) 
for any e > 0. 

Notice that up to logarithmic factors, this complexity corresponds to the output 
size of the algorithm, namely the size of the class polynomials. Notice also that 
the correctness of the output can be verified by a probabilistic, Monte-Carlo type 
algorithm; namely one may check that the reductions of the class polynomial mod- 
ulo sufficiently many suitable primes p yield elliptic curves over F p with complex 
multiplication by Od- Indeed, the main application of class polynomials is to com- 
pute elliptic curves over finite fields with given complex multiplication, and in this 
situation it can be verified independently that the curves are correct. 

As an ingredient for the proof of Theorem ll.ll we obtain in Section[5]thc following 
result for computing class groups of imaginary quadratic orders: 

Theorem 1.4. The class group of the imaginary- quadratic order of discriminant 
D and class number h can be enumerated 

• unconditionally by a probabilistic algorithm in time 



O (y |£>| log \D\ log log |2?|M(log \D\) j 
• under GRH by a deterministic algorithm in time 
0{h\og\og\D\M{\og\D\)). 
Again the algorithm is essentially optimal in its output size. 

2. Complex multiplication and class polynomials 

2.1. The basic approach. For proofs of the following facts on complex multipli- 
cation of elliptic curves see, for instance, [14]. 

Let us first consider the situation over the complex numbers. Let D < be 

an imaginary quadratic discriminant, and Od = L D+ ^/~® the (not necessarily 

L z Jz 

maximal) order of discriminant D in K = Q(\/D). The ideal class number of 
O d is denoted by h = h D . By Siegel's theorem ;42 , Mr -> \ (\D\ -> oo), so 
that \D\ £ 0(h 2+£ ) and h £ 0(\D\ 1 / 2+e ) for any e > 0. There are h isomorphism 
classes of elliptic curves over C having complex multiplication by Od, that is, curves 
with Od as endomorphism ring. Namely, let j : H = {z £ C : 'S(z) > 0} — » C 
denote the absolute modular invariant, and let r, = ^ B ^j^ nm through the 
roots in H of the reduced quadratic forms [Ai, Bi,Ci] — AiX 2 + BiX + Ci of 
discriminant D — B 2 — AAiCi, representing the ideal classes of Od', then the j- 
invariants of the elliptic curves are given by the jiji). Moreover, these j(rj) are 
algebraic integers; in fact, they generate the so-called ring class field Kd for Od, 
the Galois extension of K whose Galois group is isomorphic to the class group of 
Od (the isomorphism being given by the Artin map). The minimal polynomial of 
the over K, Hd{X) = Yl i=1 (X — j(ri)), has in fact coefficients in Z and is 
called a class polynomial. In the special case that D is a fundamental discriminant, 
the ring class field Kd is also called the Hilbert class field of K. 

Let now ¥ q = ¥ p ™ be a finite field of characteristic p. Suppose that p splits in 
K = Q{VD) and that p j D; then p is unramified in Kd- If furthermore q may be 
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written as 4q = U 2 + DV 2 with U, V £ Z, then the inertia degree of the prime 
ideals above p in Kd divides m. The reductions of the complex elliptic curves with 
complex multiplication by Od modulo any of these prime ideals thus live in ¥ q . By 
Deuring's reduction and lifting theorems 16, Einleitung, par. 5], these h curves are 
precisely the elliptic curves over ¥ q with complex multiplication by Od- They may 
be obtained as follows: Compute the class polynomial Hd £ ZLY] and reduce it 
modulo p. It splits completely over ¥ q , and each of its roots is the j-invariant of an 
elliptic curve over ¥ q with the desired endomorphism ring. 

2.2. Class invariants. Unfortunately, Hd has very large coefficients (see the dis- 
cussion in Section [4]), so that its computation requires a high precision to be accu- 
rate. In practice, one may often gain a constant factor for the required number of 
digits by using instead of j modular functions / that are invariant under T°(N) 
a b 



£ S1 2 (Z) : N\b\ for some positive integer N; that is, / (ff±f) = f(z) 
for any such matrix \ ^ j . Under suitable conditions on the discriminant D and 



c d , 

suitable normalisations of the t^, derived from Shimura's reciprocity law, the singu- 
lar values /(rj) are still elements of the class field Kd [3ZI E01 13H I3H] ; we then call / 
a class invariant and the minimal polynomial HD[f](X) — Y['i=i{X ~~ /( r 0) again 
a class polynomial. The first such class invariants are given by Weber's functions f, 
fi, hi 72 and 73 [43]. 

Two parameterised families of class invariants are exhibited in |23j , where ./V is 
the product of two primes and Hd[J] £ Z[X], and [22], where N is prime and Hd[J\ 
has coefficients in the maximal order of Q(\/Z>). Again, these class polynomials 
split completely after reduction in F g . The corresponding elliptic curves may be 
recovered from some polynomial relationship between / and j: If the modular curve 
Xq(N) has genus 0, then j is sometimes given by a rational formula in /; otherwise 
it has been suggested in [53] to look for a root of the modular polynomial <£>(/, j) 
after reducing modulo p and specialising in the value found for / in ¥ q . (Both cases 
require that all coefficients be rational to make sense modulo p, which holds for all 
exhibited class invariants.) 



3. Complexity of arithmetics 

This section discusses the well-known complexity of the basic multiprecision and 
polynomial arithmetic underlying the computations. 

3.1. Multiprecision floating point arithmetic. Let M{n) be the bit complexity 
of multiplying two n-bit integers. Using Schonhage-Strassen multiplication [55] . 
one has M(n) £ 0{n log n log log n). With Fiirer's algorithm [26], one has M(n) £ 
nlogn2 ( log n \ where log* n is the number of times the logarithm function has 
to be applied to n before the result drops below 1. So with either algorithm, 
M (n) £ O (n log 1+£ n) for any e > 0. 

The four basic arithmetic operations and the square roots of real floating point 
numbers of precision n have a complexity of 0(M{n)), the inversion and square 
roots being realised by Newton iterations as explained in Lemmata 2.2 and 2.3 of 
[B]. The same article shows that exp, sin and the constant 7r can be computed in 
0(lognM(n)). 
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Thus, the four basic operations on complex floating point numbers of precision n 

can be executed in time 0{M(n)). Letting c = \J a +V<^± M. f or a > 0, one obtains 

V a + bi — c + w- in time 0(M(n)). Finally, the complex exponential may be 
reduced to the real exponential and the real sine and cosine functions and is thus 
computed with complexity 0(logn M(n)). 

3.2. Polynomial arithmetic. Concerning operations with polynomials, we as- 
sume that a floating point precision of n bits has been fixed and is the same for 
all input and output polynomials. Multiplying two polynomials of degree d over 
C by the FFT takes 0{d log d) multiplications in C, whence it has a complexity of 
0(d\ogd M(n)) once the necessary roots of unity have been computed. 

The following algorithm ( |27l Algorithm 10.3]) obtains a monic polynomial of 
degree h from its roots by organising the computations in a binary tree. 

Algorithm 3.1 (Poly_f rom_roots). 
Input: h G N and x\, . . . , Xh G C 

Output: binary tree T k ,i = (X - z^-i^+i) ■ • • (X - x min ( i2 k >h )) 
fork = 0,...,t= \log 2 h-}, i = l,...,2 t ~ k ; 
in particular, the root T ti \ = (X — x±) • •• (X — x^) 

(1) fori = ltoh 

T 0j4 <— X - Xi 

(2) for i = h + 1 to 2* 

T , 4 «- 1 

(3) fork = ltot 

fori = l to 2 t ~ k 

Tk,i <— Tfc_ lj2 i-l • 2fe— l,2i 

(4) return T 

If the needed roots of unity are precomputed, the algorithm has a complexity of 
0(h log 2 h M(n)). Notice that the roots of unity of order 2 ri°S2 ^1 suffice to carry 
out all the FFTs in Algorithm 13. II By Section |3~T1 computing a primitive root of 
unity takes O (log nM(n)), all others can be obtained by successive multiplications 
in 0(h M(n)). Thus, the total complexity of Algorithm 13.11 becomes 

0((hlog 2 h + logn)M(n)). 

One of the asymptotically optimal algorithms for class polynomials relies on a 
related technique of symbolic computation (see [27l §10.1]). Let f(X) be a polyno- 
mial of degree d over C that is to be evaluated in h different arguments xi, . . . ,Xh- 
The key observation for doing so fast is that f(xi) is nothing but f(X) mod X — Xi. 
So the first step of the algorithm is the construction of the binary tree T^^ as in 
Algorithm [371] containing products of more and more of the X — .t,;. Then a match- 
ing tree Rk.i = f mod is computed in the converse order, from its root to its 
leaves, such that the leaves contain the desired values of /. 

Algorithm 3.2 (Multi_eval). 

Input: h G N. / £ C[A] and x\, . . . , Xh G C 

Output: f(xi), f(x h ) 

(1) T <— Poly_f rom_roots (h, X\, . . . , Xh) 

(2) t 4- riog 2 h] , Rt,i «- / mod T M 
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(3) for k = t — 1 downto 

fori = l to 2 t - k 

Rk,i *— R k | 1,1 m °d Tk,i 

(4) return R Q>1 , . . . ,-R ,ft 

A division with remainder of a polynomial of degree d by a polynomial of degree 
d' has the same complexity as multiplying polynomials of degree bounded by d 
( |27[ §9.1]). The algorithm first computes an inverse modulo some power of X by 
Newton iterations and obtains the quotient with one multiplication; the remainder 
then requires another multiplication. So after the first reduction of / modulo the 
product of all the X — X4, the complexity of Algorithm 13 . 21 on n-bit numbers is the 
same as that of Algorithm 13. II Including the first reduction, it is given by 

0((dlogd + h\og 2 h + log n)M(n)). 

4. The height of class polynomials 

The running time of the algorithms working with floating point approxima- 
tions depends crucially on the precision n. To be able to round the approximated 
class polynomial to a polynomial over the integers (Z or the maximal order of an 
imaginary-quadratic number field), n has to be at least the bit size of the largest 
coefficient, or otherwise said, the logarithmic height of the polynomial. In this sec- 
tion, we prove Theorem 1 1.21 by developing an explicit upper bound without hidden 
constants for the height of the class polynomial for j. The result is of independent 
interest; for instance, it allows to bound the precision also for the p-adic algorithms, 
which yields a proof of correctness for their output. 

As shown in [21] , the height of class polynomials for a different / changes asymp- 
totically by a constant factor, depending on the degrees in / and j of the modular 
polynomial connecting the two functions. So this section proves an asymptotic 

bound of O (yfI)[log 2 for the height of any class polynomial. To obtain a 
more explicit bound for invariants other than j, the techniques of this section may 
be used with the necessary adaptations. 

Let us first give the intuitive basis for the bound. Since the values of j are 
usually large, more often than not the largest coefficient of the class polynomial is 
its constant one. Then approximating j(r) = q^ 1 + 744 + Yl^Li ^'f by the first 
term, q^ 1 , one obtains a heuristic estimate for the height as 

[A,B,C] 

where the sum is taken over all reduced primitive quadratic forms of discriminant 
D. This heuristic estimate is very well confirmed by experimental findings, see [21j . 

Turning this idea into an explicit bound is mainly a matter of computations that 
are sketched in the following. 

The reducedness of the quadratic forms is equivalent to the r-values lying in the 
standard fundamental domain T for the action of SI2 (Z) on H, 
(4.1) 

T = <z G H : \z\ > 1,~ < < H U jz G H : \z\ = 1,~ < »(«) < j . 
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Then \q\ < e 2 ™ i± ^ = e^ 3 . The upper bound c„ < of ;8j then yields 

°° r- 
W) -q- Y \< 744 + J2 ~M^l e = kl = 2114 - 566 • ■ ■ 

V — 1 V 

for r in the fundamental domain J 7 , so that 

|j(r)|<|g- 1 |+fc 1 <fc 2 | (? - 1 | 

with k 2 = 1 + fcre -7 ^ = 10.163 . . . 

Let the A; be numbered in increasing order. Then the logarithm of the absolute 
value of the coefficient in front of is bounded above by 

iog (0) n^kr 1 !)) <io g (Vfc£nk ri i) <\o g (2k 2 )h+n^\j2± 

independently of j. 

The next step consists of estimating Yli=i ^ is P rove d m [HI Lemma 2.2] 
that Yli=i t: ^ (7(log 2 \D\). We shall derive a bound that makes the involved 
constants explicit. Consider the number of possible B for a given A. This number 
is certainly bounded above by the number of B e] — A, A] that satisfy B = D mod 2 
and = o 

mod A. Assuming the worst case that the quadratic equation has a 
root modulo each prime divisor of A and considering the cases of odd and even A 
separately, one obtains an upper bound on the number of B of 2 ■ WW < 2t(A), 
where w(A) denotes the number of prime factors of A and t(A) its number of 
divisors. Hence, 



i=l ' A=l 

and this sum may be bounded by standard techniques from analytic number theory. 
We use the estimates 

N 1 1 
(4.2) i giV + 7 <]r_ <log7v+ 7 + — 

■'— ' n 2J\ 

n—l 



with Euler's constant 7 = 0.577 . . . and 

„ o1 ^logn log2 f N+1 logt , 1, 2 AT 

( 4 - 3 ) E^-^^-+/ -^dt>-log 2 N 



n ~ 2 Jo t ~ 2 

n=l J6 
,.2 



with k 3 = \ (log 2 3 - log 2) = 0.256 ... We have 

N , N N \_N/m\ 

2 yi^l = 2 y y J_ = 2 y i = 2 yl y 1 

^— ' -4 ' ^— ' 77777 77777 ^— { 111 ' 77 



A— 1 A— 1 Km.,n:mn=A Km.n:mn<N rn—1 n=l 



< 2^- log-+ 7 +-^v 

^— ' 777 \ 777 } 

m— 1 \ 



2|-| 

LmJ 



IV 



9 o lOg N + 7 V— -v I 

< log 2 jV + 4 7 logiV + 27 2 + & y +2 fc 3 +^^ rr 

m=l LmJ 

by flO) and gSJ) 
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Using (14. 2j) . the last term of this sum can be bounded by 



N L"tJ N N 

L, " J m=1 Vm ; ^=[fj+i ™=Tfl 



Combining the inequalities, the height of the class polynomial is bounded from 
above by 

(4.4) k 5 h + ir^\D\ ^log 2 jV + 4 7 logjV+ lo S N + l + 1 

with N = y/M, fc 4 = 2fc 3 + 2 log 2 + 2 7 2 = 2.566 ... and k 5 = log(2fc 2 ) = 3.011 . 
Similarly to the argumentation above one shows that 

N 

(4.5) h<2^2r(A) < 2A(log A + 7) + 1, 

which also implies ft 6 O f log 1 1? 

Combining (j4.4[) and (|4.5[) yields the bound of Theorem 11.21 

5. Class group enumeration 

Computing the class group of Od by enumerating all reduced primitive quadratic 
forms of discriminant D is a step of the algorithm that can be neglected in practice. 
In the new algorithms of Sections 16.31 and 16.41 however, its theoretical complexity 
risks to come close to that of the crucial parts. One has to determine all coprime 
[Ai, Bi, Ci) such that \Bi\ < Ai < Cj, Bi > if one of the inequalities is not strict, 

and D = Bf — AAiCi. These conditions imply that Ai < 

5.1. The na'ive algorithm. The following trivial algorithm is implemented in 
the code described in more detail in Section [7] Loop over A and B such that 

< B < A < J ^ and B = D (mod 2) and compute the corresponding C, if 
it exists. While it can be seen in the figures of Table Q] that this approach takes 
virtually no time, it carries out 0(|-D|) arithmetic operations with integers of bit 
size in 0(log \ D\) and thus has a complexity of 

0(|£>|M(log|D|)), 

which is smaller than the bound of Corollary 11.31 only by logarithmic factors. 

5.2. Saving one loop. An asymptotically faster (and again up to logarithmic 
factors optimal) algorithm is inspired by the theoretical considerations of Section |U 

Looping over A in the interval 1 < A < \J one solves the congruence B = 

D mod 2 and B ~ D = mod A, and keeps the form if C = B 4 ~ A D > A. As shown 
in Section QJ the total number of B to consider is in 0{^\D\ log \D\). 

To find the square root of D modulo A, the prime factorisation of A is required, 
and the most convenient approach is to enumerate the A in factored form. To 

this purpose, one needs the primes up to \/^-, which can be obtained in time 
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O f i^g i g pi j by [2\. All possible A arc then computed with an amortised cost 

of one multiplication per value in total time 0(^/\D\M(\ag \D\)). 

For any given A and prime p\A, a square root of D modulo p is computed in 
0(\ogp M(\ogp)) by Cipolla's algorithm [TT], lifting to a root modulo p e for p e \\A 
requires an additional 0(e M(logp)). For one A, this takes 

O ]T log(p e )A/(logp) C 0(log AM (log A)) C 0(log \D\ M(log \D\)), 
\p°\\A J 

or O (j\D\log \D\ M(log |Z>|)) for all A. 

The roots modulo prime powers have to be recombined by the Chinese remainder 
theorem to form a candidate B. Organising the computations in a tree with the 
leaves indexed by the p e ||A, one may use the analogues for numbers of the fast 
algorithms for polynomials of Section [3721 The tree has uj(A) S 0(log A) leaves and 
thus a height of (9(loglog A), and the root contains a number of 0(log A) digits, so 
that one computation of Chinese remainders takes 

0(log log A M (log A) ) . 

This is to be multiplied by the total number of B, which yields an overall complexity 

of 

0(^\D\ log |D| log log \D\ M(\og \D\)). 

So Chinese remaindering is the dominant step of this algorithm for class group 
enumeration, and the first bound of Theorem 11.41 is proved. 

5.3. Generating the class group by primes. Another possible approach is to 
start with a set of prime forms generating the class group. According to [H p. 376], 
under GRH a system of generators is given by the forms having a prime A-value 
bounded by 6 log 2 | D\ , and these can be computed in time polynomial in log \ D\. Let 
pi, p2, ■ ■ • denote these generators. One may enumerate the powers of pi in the class 
group by composing and reducing quadratic forms until reaching the order e\ of pi 
such that p^ 1 = 1. Next, one computes the powers of p2 until p^ 2 lies in the subgroup 
generated by pi, constructed in the previous step; then all combinations p^p^ 2 with 
< cij < ej are added to yield the subgroup (pi, p 2 ). One continues with the powers 
of p 3 until P3 3 falls into this subgroup, and so forth. If all computed elements are 
stored in a hash table or simply in a matrix indexed by A and B, looking them 
up takes negligible time. The running time of the algorithm is dominated by 0(h) 
computations in the class group, each of which takes (9(loglog \D\ M(log \D\)) by 
[40] . This proves the second bound of Theorem 11.41 

5.4. From class groups to A^-systems. When working with a class invariant 
other than j that is not invariant under Sl2(Z), but only under T (N) for some 
N > 0, the representatives of the class group need to be normalised to obtain a 
coherent set of algebraic conjugates. Such a normalisation is, for instance, given by 
an A-system as defined in 38 . It has the property that 

gcd(Ai,N) = 1 and B, = B x (mod 2 A) for 1 < % < h 

and may be obtained by applying suitable unimodular transformations to the orig- 
inal [A4, Bi, Cj]. The coefficients of these transformations are defined modulo A, 
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so that for fixed N, trying all possibilities requires a constant number of arithmetic 
operations per form and increases the size of the Ai, Bi and C, by a constant factor. 
Thus, transforming a system of reduced quadratic forms into an iV-system requires 
an additional 0(h M (log |-D|)), which is covered by the previous enumeration of 
the forms. (In practice, one would use a more intelligent approach to lower the 
complexity with respect to N, cf. the constructive proof of Proposition 3 in [38].) 

6. Complexity of class polynomial computation 

We are now able to provide the generic complexity for class polynomial computa- 
tion via floating point approximations. Different approaches to evaluating the class 
invariants lead to algorithms with different overall complexities; these are examined 
below. 

Let n be the precision in bits used for the computations, let h be the class number 
of On and denote by E(h,n) the time needed to evaluate the class invariant with 
precision n in h values. Then the algorithm takes 

• 0(h 1+e ) for enumerating the class group according to Section 

• E(h, n) for evaluating the class invariant and 

• 0((Mog h + log n)M(n)) for reconstructing the class polynomial from its 
roots according to Algorithm 13. 11 

The class group computation is indeed negligible since E(h, n) is at least of 
order hn, which is the time needed to write down the conjugates. So it remains to 
examine in more detail the quantity E(h, n). 

We propose four different algorithms for evaluating modular functions. The first 
two of them are well-known, the third one is a novel application of the techniques of 
symbolic computation presented in Section 13.21 and it already allows to obtain the 
complexity stated in Corollarv ll.31 The fourth one gains an additional logarithmic 
factor for the evaluation phase and yields the slightly more precise statement of 
Theorem 11.11 without changing the conclusion of Corollary 11.31 

The class polynomial Hjj[f] = Yl i= i(X — /(n)) is obtained by evaluating the 
function / in the h different arguments r< = ~^2A~^ ^ or an A^-system [j4,-, B,-, Cj] 
(see Section l3~2j) . where N depends on / and is assumed to be fixed. Since f(z) is 
modular for T (N), it is invariant under the translation z i— > z + N and admits a 
Fourier transform, that is, a Laurent series expansion in the variable q 1 ^ = e 2mz / N . 
Thus, the /(tj) may be obtained by first computing the corresponding q X J N and 
then evaluating the g-expansion in these arguments. By Section |3~T1 the can be 
computed in time 0(hlogn M(n)), which will be dominated by the actual function 
evaluations. 

6.1. The naive approach. The straightforward technique for evaluating the mod- 
ular function / = Y1u=l> c u (l 1 ^) consists of a Horner scheme for the polynomial 
part YTJ=o° c v+vo (q 1 ^) an d a multiplication by (q 1 ^)" . (Notice that usually 
uq < 0.) Its complexity is 0((yi — + logi'o)M(n)), the c„ having been precom- 
puted to precision n. Actually, vq is a constant depending only on /; v\, however, 
depends not only on the c v , but also on the desired precision n and on jq 1 ^) and 
thus on the function argument. 

Consider first the classical case of / being j, which is invariant under SL)(Z). We 
may then assume that the arguments are transformed by a matrix in SLj (Z) into the 



COMPLEXITY OF FLOATING POINT CLASS POLYNOMIAL COMPUTATION 



11 



standard fundamental domain T of (|4.1[) prior to evaluating j, so that \q\ < e " 3 
is bounded from above by a constant less than 1. On the other hand, it is shown in 
[5] that < c v < -ji^ji for v > 1. Thus, it is possible to fix v\ £ 0(n) to obtain 
a precision of 0(n) digits. 

The total complexity of evaluating j at h values then becomes 

0(hnM(n)), 

or 

O (\D\ log 3 \D\ M (v^log 2 |£>|)) C O {\Df 2 log 6+£ |£>|) 

with neO (y\D\ log 2 \D\j and h e O (y/\D~\ log \D\ \ according to Section H 

Concerning alternative class invariants, unfortunately the fundamental domain 
for T°(N) with N > 1 contains at least one rational number (called a cusp). In such 
a cusp, \q\ — 1, and the g-expansion usually diverges; in a neighbourhood of the 
cusp, it may converge arbitrarily slowly. In this case, it is possible to use a different 
expansion in the neighbourhood by transporting the cusp to infinity via a matrix 
in Sl2(Z). We will not pursue this discussion, since the approach of Section IBT21 
provides a faster and simpler solution for all currently used class invariants. 



6.2. Using the sparsity of 77. Virtually all class invariants suggested in the liter- 
ature are in some way derived from Dedekind's 77-function. This is the case for the 

Weber functions f(z) = e"" Z 24 = and f 2 (z) = ^2 ^ already 

>7( — 1 

examined in the generalised Weber functions Wn(z) — Vv suggested in |22j . 

Jf.)Jf-) 11 
the double 77 quotients rD Pl P2 (z) = v pi / v p2 / proposed in 123] , and even for j. In 

fact, j is most conveniently computed as j = f ^-p-^ . 

The definition of 77 in [15] is closely related to the partition generating function: 



= „V24 

iy>l 



evaluating the product to precision n requires 0(n) arithmetic operations. An 
expression better suited for computation is given by Euler's pentagonal number 
theorem 1251: 



/ 00 



Since the occurring exponents are values of quadratic polynomials, the series is 
very sparse: To reach an exponent of order 0(n), only 0(\fn) terms need to be 
computed, and this process can be implemented with 0(y/n) multiplications. Recall 
that any polynomial of fixed degree can be evaluated in an arithmetic progression 
with a constant number of arithmetic operations per additional value, once the first 
few values are known; the employed algorithm relies on iterated differences. In the 
special case of 77, the following recursion yields two additional terms of the series at 
the expense of four multiplications by recursively computing q v , q 2u ~ x , q v{ ? v ~ x )l 2 
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and q ^+i)/2 as follows . 

q v(3u-l)/2 = (!/-l)(3(i/-l)+l)/2 , 2v-l 
q v{Zv+l)/2 = v (3v-l)/2 . u 

Besides the sparse and regular series expression, the r\ function has a second 
crucial property that makes it well suited for computation: It is a modular form of 
weight 1/2. As such, unlike j, it is not invariant under transformations in Sl2(Z). 
However, its transformation behaviour is explicitly known (cf. [171 §4]) and easily 
computable. Thus, to obtain rj(z) for an arbitrary value of z, one should first 
transform z into the fundamental domain J-, so that the series can be truncated 
at an exponent of order 0(n). Then the evaluation of 0(y/n) terms of the r\ series 
for O(h) distinct values (the constant being at most 4 for the ro PliP2 mentioned in 
the beginning of this section) with a floating point precision of 0(n) digits can be 
carried out in time 

0(h«/riM(n)), 

or 

O (|£| 3/4 log 2 \D\ M (^\D\\og 2 \D\)) C O [\D\^ A log 5+£ \D\) 

asheO (^y/\D~\log |Z>Q and n G O \J\D\ log 2 \D\\ according to Sectiongl 

It remains, however, to verify that transforming the arguments into the funda- 
mental domain is dominated by the cost of the series evaluation. The arguments be- 
ing roots of an iV-system [A % , B l , C t ] with A. t , \Bi\ e 0{N 2 y/\D~\) = 0{y/\D\), they 
may be transformed into T by reducing the quadratic forms in time 0(h M(log \D\)) 
(see [121 Prop. 5.4.3]), which is negligible. The same holds for arguments such as 
or jt, corresponding to quadratic forms whose discriminants have absolute 
values in 0(|D|). 

6.3. Multipoint evaluation. The algorithms of Sections 16.11 and \6 . 21 compute the 
values of modular functions one at a time; but for the sake of class polynomial 
computation, we need the values in many points. For polynomials, Algorithm 13.21 
provides a fast way of doing exactly this. And indeed, from a numerical point of view 
a class invariant can be seen as a polynomial via its truncated g-expansion. Either 
one considers the function directly as done for j in Section [6. 11 or one proceeds via 
r\ as in Section RT21 In both cases, a polynomial of degree 0(n) has to be evaluated 
in 0(h) points, which by Algorithm 13.21 can be done in time 



O ((nlogn + h\og 2 h)M(n)) C O (y/\D\ log 3 \D\ M 
QO{\D\\og^\D\). 



6.4. Newton iterations on the arithmetic-geometric mean. A new approach 
for evaluating modular functions in single arguments is described in [18j . It is based 
on the arithmetic-geometric mean and Newton iterations on a function involving it. 
The basic algorithm underlying [18| Theorem 4] computes the modular function k' , 
whose square A satisfies 

256 (1 -A + A 2 ) 3 
(A(l-A)) 2 J ' 
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For an argument with imaginary part bounded by a constant and the precision n 
tending to infinity, it has a complexity of 0(log n M(n)). 

During class polynomial computations, the precision and the imaginary part of 
the arguments are tightly coupled, so that this algorithm is not sufficient to derive 
the desired complexity result. The modification of [18l Theorem 5] obtains the 
same complexity of 0(logn M(n)) uniformly in the argument. If the imaginary 
part of the argument is of the order of the required precision, then the algorithm 
of Section 16.21 or even the naive algorithm of Section 16. f I already yield the desired 
result with a constant number of arithmetic operations. Otherwise, the argument 
is repeatedly divided by 2 until its imaginary part is smaller than a constant, which 
can be compensated by iterations of the arithmetic-geometric mean. Then the 
previous Newton algorithm converges sufficiently fast. 

For other modular functions /, one may have the evaluation of k' followed by 
Newton iterations on the modular polynomial relating kl and /. For fixed /, this 
phase does not increase the complexity. The approach does not work, however, 
for r], which is a modular form of weight 1/2 instead of a modular function (of 
weight 0). The algorithm of [TBI Section 7.2] computes first 6*g , a certain modular 
form of weight 1, as the inverse of the arithmetic-geometric mean of 1 and k' , and 
then r\ as the twelfth root of A(l — A)#g /16 by a suitably initialised Newton process. 
Again, one obtains a complexity of 0(lognM(n)) for an evaluation at precision n, 
uniformly in the argument. 

The complexity for evaluating in h arguments then becomes 

0(h\ognM(n)) C O (^/\D~\\og 2 \D\ M (y/\D\ log 2 |Z?|)) C O (\D\ log 5+£ \D\) 

with the estimates of Section 0] for h and n. Taking into account the time needed 
to compute the class polynomial from its roots by Algorithm 13. f I this proves The- 
orem [O] 

7. IMPLEMENTATION 

The algorithms of this article have been implemented using gmp [32 with an 
assembly patch for 64 bit AMD processors , mpfr [33J and mpc [21] for the 
multiprecision arithmetic and mpfrcx J2U] for the polynomial operations. Table [T] 
provides running times for class numbers between 2500 and 100000, obtained on 
an AMD Opteron 250 with 2.4 GHz. All timings are given in seconds and rounded 
to two significant digits. (The computations for class number 100000 have been 
carried out on a 2.2 GHz machine disposing of more memory, and the running 
times have been scaled accordingly.) For each class number, the discriminant with 
smallest absolute value has been chosen. Only the algorithms of Sections 16.21 to 16.41 
are taken into account; the naive approach of Section 16.11 is clearly inferior to the 
one exploiting the sparsity of r\. The chosen class invariant is the double ^-quotient 
t^3.i3j and its values are obtained by precomputing a table for the values of r\ at 
the h reduced quadratic forms. 

The first lines of the table provide some general information. The precision (1) of 
the floating point computations is obtained by increasing the estimate of Section 0] 
by 1% to account for potential rounding errors. As the chosen class invariant is 
not j, a correction factor depending on the class invariant needs to be used, see 
Section 2] This factor is correct only asymptotically, which explains why the actual 
height (2) is a bit smaller than the precision estimate. For j, the two are closer to 
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Table 1. Running times 



each other. M(n) is measured by computing the first 100000 successive powers of 
7T + 47 with Euler's constant 7. 

The second block of lines provides timings for the steps that are independent of 
the algorithm used for evaluating the modular function. As can be seen, the class 
group computation (4) is completely negligible. Line (5) corresponds to the effort 
of deriving all values of the class invariant from the tabulated r\ values (reduction 
of quadratic forms, multiplication by 24-th roots of unity and computation of the 
r\ quotients). The computation of the polynomial from its roots (6) corresponds 
precisely to Algorithm 13. H and provides a measure for the complexity of the oper- 
ations with polynomials. For constructing the largest polynomial of degree 100000, 
the polynomial FFT has been disabled during the last steps and replaced by Toom- 
Cook multiplication, since the FFT consumed too much memory; this explains the 
jump from h — 40000. 

The third block contains the timings for evaluating rj in the reduced quadratic 
forms (7) using the sparse series representation as described in Section l6~2l and the 
total running time for computing the class polynomials using this technique (9). 
Line (8) details the time spent in (7) (and also in (10)) for computing the it 
essentially measures the complex exponential. 

The fourth block represents the corresponding results for the multipoint evalua- 
tion approach of Section |6~31 and the last block corresponds to the asymptotically 
fastest evaluation of Section 16.41 for which an implementation by Dupont has been 
used. As explained in Section 16.41 the algorithm requires to switch to the sparse 
series evaluation when the imaginary part of the argument becomes too large. In 
the implementation, the AGM code is disabled for an imaginary part larger than 5. 

Comparing first the evaluation of 77 as a sparse series or by multipoint evaluation, 
one notices that the asymptotically faster algorithm is about 5 to 8 times slower 
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on the examples and that it appears to catch up with growing class numbers. 
However, this happens so slowly that one cannot expect it to beat the algorithm in 
O (l-DI 1 - 25 " 1 " 5 ) in the foreseeable future for any tractable instance. 

The approach using Newton iterations on the AGM is faster than multipoint eval- 
uation, but still hardly beats the asymptotically slower algorithm in O (|Z?| 125+e ) : 
The biggest computed example of class number 100000 lies just beyond the cross- 
over point! 

One notices that the growth rates of the running times of all algorithms, instead 
of behaving like \D\ or IZ?! 125 , come closer to l-Dl 1 ' 4 or A small part of this 

can be explained by the rather peculiar choice of discriminants. Taking the first 
one with a given class number, the precision n is rather large compared to h and 
•^/[Uj", since there are many forms with small A so that the sum Xh=i in becomes 
comparatively large. 

However, the major reason for the faster than predicted growth of computing 
time is that for the floating point arithmetic the range for asymptotically fast 
algorithms is not yet reached. The threshold for switching to the FFT in gmp 
on the test machine is set to about 500000 bits; so the examples lie still in the 
Karatsuba or Toom-Cook range, which accounts for a growth of M(n) of ?i los 3 / log 2 
resp. n lo s 5 / lo s 3 instead of n. 

A possible improvement of the multipoint evaluation approach consists of con- 
veniently grouping the arguments. For instance, in the example of class number 
5000, the function r\ has to be evaluated in 2501 arguments (corresponding to the 
two ambiguous forms and 2499 pairs of opposite non-ambiguous forms). Assuming 
the worst case of \q\ w e _7rv/ ^, that is almost reached for the largest values of A, 
one can approximate r\ by a polynomial of degree 1190. As multipoint evaluation 
should be most efficient when the number of arguments is about half the degree, 
it makes sense to perform four evaluations in 625 resp. 626 arguments each. But 
sorting them by their absolute values, the smaller ones do not actually require such 
a high degree approximation of r\. In the example, an approximation of degree 260, 
551, 805 resp. 1190 is sufficient for the four chunks of q. Then the time used for 
multipoint evaluation drops from 93 s to 64 s. Experimenting with different par- 
titions of the arguments (three resp. five parts of the same size, parts of different 
sizes adapted to the degree of the approximations, etc.) yields similar results, far 
from competing with the sparse series evaluation. 

Another point to take into account are the space requirements of the algorithms. 
When each root of the class polynomial is computed separately, only 0(hn) bits 
need to be stored, which is linear in the output size. Multipoint evaluation as 
described in Section 13.21 however, requires that the tree constructed in Step 1 of 
Algorithm 13.21 be maintained in memory, so that the occupied space grows by a 
logarithmic factor to reach 0(hn\ogh). This logarithmic factor could be saved by 

evaluating in chunks of O (j^j^j arguments, as explained in Lemma 2.1 of [28] , 

Anyway, the example class polynomial of degree 100000 uses over 5 GB as an 
uncompressed text file and is computed in about 3 days. This shows that the 
limiting factor is the memory requirement rather than the running time, as can be 
expected from algorithms that have a close to linear complexity with respect to 
their output size. 
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As a final remark, one notices that the algorithms behave numerically well, even 
though rounding errors do occur during floating point computations. For the algo- 
rithm of Section |6.2[ this can be explained by the sparsity of the rj series and the 
fact that all coefficients are +1 and — 1. Indeed, if the last few digits of a term 
are erroneous, these errors propagate to subsequent terms. However, the absolute 
magnitude of such errors decreases rapidly, so that the wrong digits in later terms 
actually do not intervene in the additions. (Otherwise said, the computations may 
as well be carried out with fixed point numbers, and indeed a simulation of fixed 
point arithmetic using floating point numbers of decreasing precision yields accurate 
results.) 



8.1. Chinese remaindering. In [T], the authors suggest an approach for directly 
computing class polynomials for j modulo a prime p. The basic idea is to derive the 
polynomial modulo many small primes by enumerating all elliptic curves modulo 
these small primes and only retaining those having complex multiplication by Od ■ 
Then a Chinese remainder technique allows to obtain the class polynomial modulo p. 

Unfortunately it is not sufficient to gather only O(logp) bits of information 
modulo small primes per coefficient of the class polynomial, although this is the 
information contained in the final output. In fact, so many small primes are needed 
that the class polynomial could be reconstructed over Z instead of only modulo p. 
The complexity derived in Section 3.2 of [1] is 



and already the term depending only on \D\ is worse than what is obtained with 
the algorithms of Section [5] 

Thus, it is asymptotically faster to compute the class polynomial over Z using 
floating point approximations and to reduce it modulo p afterwards. 

8.2. p-adic algorithms. Couveignes and Henocq suggest in [T5] ap-adic approach 
for computing class polynomials for j. The basic idea is to look for a small prime p 
and (by exhaustive search) an elliptic curve modulo p with complex multiplication 
by Od ■ This curve is then lifted to the p-adic numbers with a high enough precision 
so that the class polynomial may be reconstructed. Alternatively, they show how 
to work with supersingular curves. The complexity of their approach is 0(|-D| 1+e ), 
where the exact power of the logarithmic factor has not been worked out. By nature, 
the algorithm is not affected by rounding errors, and using the explicit bound of 
Theorem II .21 its output is certified to be correct. 

So the complex and the p-adic approach are both essentially linear in the size 
of the class polynomials. All variants have been implemented (for more details on 
the complex implementation, see [21]; for the ordinary p-adic algorithm, see |10j : 
for the supersingular one, |35j). and all seem to work reasonably well in practice. 
The floating point algorithms are easy to implement with arbitrary class invariants 
using the results of [38j . The p-adic approach with ordinary curves has been made 
to work with certain class invariants other than j, see [10] and [5J Chapter 6]. The 
considerable overhead involved makes it unclear whether it is competitive with the 
complex approach. 



8. Comparison to other approaches 
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